Introduction
Hopfield networks, also known as recurrent networks, introduce feedback into the flow of data through a standard artificial neural network. This feedback mechanism was envisioned as a means of providing an ANN with a form of memory and, along with the falling cost of computing power, the concept reinvigorated the field of artificial intelligence in the late 1980s. Recurrent networks have had some success in pattern and speech recognition, but in practice it remains debatable how useful they are for real-world problems; it is often more common to solve complex problems by using larger networks (taking advantage of cheaper computing power) or by using alternative learning mechanisms (such as reinforcement learning or deep learning, which we shall review later). In this section, however, we shall review how the recurrent network functions, as it provides a means of better understanding network behaviour and highlights some of its limitations.
Basic feedback operation
As noted, Hopfield added feedback connections to the standard feed-forward flow of data through an ANN and, in doing so, demonstrated that ANNs could achieve some interesting behaviours; in particular, they could act as if they had memory. There are many architectures for recurrent networks, but in general feedback is accomplished by making connections from the outputs of neurons back into their inputs, as illustrated below:
Figure 1: Basic ANN with feedback
Networks with such connections are called ‘feedback’ or ‘recurrent’ networks.
The network illustrated in Figure 1 operates in the same way as the feed-forward networks introduced earlier: the neurons function in the same way, weights are applied to the inputs of neurons, and the arrows indicate the connections through which data flows. The network differs in that once the data has been processed by the neurons and an output produced, that output is fed back into the network as an updated set of inputs. These new input data are once again passed forward through the network, resulting in a new set of outputs. This process is repeated and, after some time, the outputs converge on final values (i.e. they do not continue to change on subsequent cycles). In this state the network is said to have 'relaxed', and these outputs are considered the final state of the network. The relaxing process of a recurrent network is illustrated in the following flow chart:
Figure 2: Relaxing process in a recurrent network
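To make the flow chart concrete, here is a minimal sketch of the relaxing process in Python. The function and variable names, and the use of NumPy, are illustrative assumptions rather than part of the original description:

```python
import numpy as np

def relax(W, inputs, max_cycles=100):
    """Feed the outputs back in as inputs until they stop changing.

    W[i][j] is assumed to be the weight from neuron i into neuron j;
    'inputs' is the initial input vector. Both names are illustrative.
    """
    state = np.asarray(inputs, dtype=float)
    for _ in range(max_cycles):
        net = state @ W                        # net input to each neuron
        new_state = np.where(net >= 0, 1, -1)  # threshold each neuron at 0
        if np.array_equal(new_state, state):
            return new_state                   # outputs converged: 'relaxed'
        state = new_state
    return state                               # guard: give up if no convergence
```

Note that this sketch updates all neurons at once on each cycle; Hopfield's original analysis updated one neuron at a time (asynchronously), which is what guarantees convergence, so the cycle limit here is a simple safeguard against oscillation.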
When a network is allowed to relax in this way, it begins to show an interesting capability: it can reconstruct a clean pattern from a corrupted version of the original:
Figure 3: Reconstruction of a corrupted pattern
This characteristic implies that the network has been able to store the correct (uncorrupted) pattern, i.e. the network has a memory. We shall illustrate this concept in the next section.
Training a recurrent network
There are a number of ways to train an artificial neural network, such as the back-propagation method introduced earlier. The recurrent network proposed by Hopfield is also trainable but uses a very simple method, consisting of a single calculation per weight, such that the whole network can be trained in just one pass. As such, it is often termed the 'one-pass method'. This method is illustrated in the simple pattern-recognition task that follows. Let us suppose that the network is to be trained to recognise two black-and-white patterns. These patterns can be represented in two bits as illustrated below:
Notes:
- The original Hopfield network used similarly simple binary-state neurons, so that is what will be used in this example.
- Instead of simple binary inputs [ 1 0 ], this network operates on inputs [ +1 -1 ].
- A threshold-activated neuron model is used, with the threshold value set to 0 (a minimal code sketch of this neuron follows these notes).
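As a minimal sketch, this threshold neuron can be written as follows. The behaviour at a net input of exactly 0 is an assumed convention, as the text does not specify a tie-breaking rule:

```python
def threshold_neuron(net_input):
    # Binary-state (+1 / -1) neuron with its threshold set to 0.
    # Returning +1 at exactly 0 is an assumed convention.
    return 1 if net_input >= 0 else -1
```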
The weights that ensure these patterns are stored in the network can then be calculated by following some rules:
- Where a weight has equal indices, the weight is set to zero, e.g. W(1,1) = 0.
- Where a weight has unequal indices, the weight is the sum, over each pattern to be trained, of the product of the outputs corresponding to the indices of the weight, e.g. for two patterns A and B:
W(1,2) = O1(A) × O2(A) + O1(B) × O2(B)
where Oi(P) denotes the output of neuron i for pattern P.
These rules can be proven mathematically, but doing so is not required here to demonstrate the functioning of the network.
Once the weights have been calculated, the network is trained to recognise these patterns. The network could then be fed a corrupted version of one pattern, and given time to relax, could reconstruct the original pattern. We shall demonstrate these concepts in the following worked examples.
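Before working through an example by hand, here is a sketch of how the one-pass rule might be implemented in Python: each weight is produced by a single calculation, with zero weights where the indices are equal. The function name and the use of NumPy are illustrative choices, not part of the original method description:

```python
import numpy as np

def one_pass_weights(patterns):
    """One-pass weight calculation for a Hopfield-style network.

    patterns: a list of +1/-1 vectors to be stored. For unequal indices,
    W(i, j) is the sum over patterns of Oi x Oj; equal indices are zero.
    """
    P = np.array(patterns)
    W = P.T @ P             # W[i, j] = sum over patterns of O_i * O_j
    np.fill_diagonal(W, 0)  # weights with equal indices are set to zero
    return W
```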
Worked example 1 – Calculating the weights of a recurrent network
Let’s say that we want to train the following three patterns:
We can ‘code’ these patterns to the following outputs:
To store these patterns, we require a 3 input / 3 neuron Hopfield Network:
The weights of this network (that program the network to recognise the three patterns) can be calculated by following the rules above:
W(1,1) = 0
W(1,2) = O1(A) × O2(A) + O1(B) × O2(B) + O1(C) × O2(C) = -1
W(1,3) = O1(A) × O3(A) + O1(B) × O3(B) + O1(C) × O3(C) = -3
W(2,1) = W(1,2) = -1 (the sum-of-products rule is symmetric in its indices)
W(2,2) = 0
W(2,3) = O2(A) × O3(A) + O2(B) × O3(B) + O2(C) × O3(C) = +1
W(3,1) = W(1,3) = -3
W(3,2) = W(2,3) = +1
W(3,3) = 0
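The pattern codings themselves are not reproduced above, and several different codings produce the same weights. Purely for illustration, the following is one set of codings consistent with the weight values used in the tests below, checked against the one_pass_weights sketch above:

```python
# Hypothetical codings (an assumption, NOT taken from the original text):
# one choice of three +1/-1 patterns whose one-pass weights match the
# values used in the tests that follow.
patterns = [[+1, -1, -1],   # Pattern A (assumed)
            [-1, -1, +1],   # Pattern B (assumed)
            [-1, +1, +1]]   # Pattern C (matches the test input below)

W = one_pass_weights(patterns)
# W = [[ 0, -1, -3],
#      [-1,  0,  1],
#      [-3,  1,  0]]
```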
To test this network, we can input one of the original patterns (e.g. Pattern C, coded [ -1 +1 +1 ]) and check that the output matches the input:
Define the inputs (Pattern C):
I1 = -1, I2 = +1, I3 = +1

Calculate the input to each neuron (the weights feeding neuron j are W(1,j), W(2,j), W(3,j)):
Neuron 1: (I1 × W(1,1)) + (I2 × W(2,1)) + (I3 × W(3,1)) = (-1 × 0) + (+1 × -1) + (+1 × -3) = -4
Neuron 2: (I1 × W(1,2)) + (I2 × W(2,2)) + (I3 × W(3,2)) = (-1 × -1) + (+1 × 0) + (+1 × +1) = +2
Neuron 3: (I1 × W(1,3)) + (I2 × W(2,3)) + (I3 × W(3,3)) = (-1 × -3) + (+1 × +1) + (+1 × 0) = +4

Calculate the outputs (threshold at 0):
O1 = -1 (since -4 < 0), O2 = +1 (since +2 > 0), O3 = +1 (since +4 > 0)

The output [ -1 +1 +1 ] matches the input pattern: Pattern C is stable, confirming that it has been stored.
We can also test this network’s capability to recognise a corrupted version of Pattern C:
Define the inputs (a corrupted version of Pattern C):
I1 = -0.9, I2 = +0.85, I3 = +0.8

Calculate the input to each neuron:
Neuron 1: (-0.9 × 0) + (+0.85 × -1) + (+0.8 × -3) = -3.25
Neuron 2: (-0.9 × -1) + (+0.85 × 0) + (+0.8 × +1) = +1.7
Neuron 3: (-0.9 × -3) + (+0.85 × +1) + (+0.8 × 0) = +3.55

Calculate the outputs (threshold at 0):
O1 = -1 (since -3.25 < 0), O2 = +1 (since +1.7 > 0), O3 = +1 (since +3.55 > 0)

The output [ -1 +1 +1 ] is the original, uncorrupted Pattern C: the network has reconstructed the stored pattern from the corrupted input.
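Both tests can be reproduced with a few lines of Python, using the weight values from the worked example (the helper reuses the conventions of the sketches above and its name is illustrative):

```python
import numpy as np

# Weight matrix from the worked example; W[i, j] is the weight from
# neuron i+1 into neuron j+1 (0-based indices here, 1-based in the text).
W = np.array([[ 0, -1, -3],
              [-1,  0,  1],
              [-3,  1,  0]])

def update(inputs):
    net = inputs @ W                   # net input to each neuron
    return np.where(net >= 0, 1, -1)   # threshold each neuron at 0

print(update(np.array([-1.0, 1.0, 1.0])))    # clean Pattern C   -> [-1  1  1]
print(update(np.array([-0.9, 0.85, 0.8])))   # corrupted input   -> [-1  1  1]
```

In both cases the output is [ -1 +1 +1 ], i.e. Pattern C, and since feeding that output back in reproduces it unchanged, the network has relaxed after a single cycle in this example.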
Summary
In summary:
- In this section, we introduced the Hopfield Recurrent Neural Network and reviewed how data is processed by the network.
- We then reviewed the basic feedback mechanism and the one-pass training method, and noted the concept of memory within these networks.
- Finally, through a worked example, we illustrated how such a network may be used to reconstruct corrupted input patterns.